
    MINTmap: fast and exhaustive profiling of nuclear and mitochondrial tRNA fragments from short RNA-seq data.

    Transfer RNA fragments (tRFs) are an established class of constitutive regulatory molecules that arise from precursor and mature tRNAs. RNA deep sequencing (RNA-seq) has greatly facilitated the study of tRFs. However, the repetitive nature of the tRNA templates and the idiosyncrasies of tRNA sequences necessitate the development and use of methodologies that differ markedly from those used to analyze RNA-seq data when studying microRNAs (miRNAs) or messenger RNAs (mRNAs). Here we present MINTmap (for MItochondrial and Nuclear TRF mapping), a method and software package developed specifically for the quick, deterministic and exhaustive identification of tRFs in short RNA-seq datasets. In addition to identifying tRFs, MINTmap unambiguously calculates and reports both their raw and normalized abundances. Furthermore, to ensure specificity, MINTmap identifies the subset of discovered tRFs that could originate outside of tRNA space and flags them as candidate false positives. Our comparative analysis shows that MINTmap exhibits higher sensitivity and specificity than other available methods while also being exceptionally fast. The MINTmap code is available at https://github.com/TJU-CMC-Org/MINTmap/ under the open-source GNU GPL v3.0 license.
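
    A toy Python sketch of the general idea described above: reads are matched exactly against every substring of a small "tRNA space", raw counts are converted to reads per million (RPM), and any sequence that also occurs elsewhere in the genome is flagged as a candidate false positive. This is not MINTmap's implementation; the function names, the 16 to 50 nt length window, normalizing by all input reads, and the sequences are assumptions for illustration only.

```python
from collections import Counter


def substring_index(sequences, min_len, max_len):
    """Collect every substring of `sequences` whose length is in [min_len, max_len]."""
    index = set()
    for seq in sequences:
        for start in range(len(seq)):
            for length in range(min_len, max_len + 1):
                if start + length > len(seq):
                    break  # longer substrings would run past the end of seq
                index.add(seq[start:start + length])
    return index


def profile_trfs(reads, trna_space, non_trna_space, min_len=16, max_len=50):
    """Exact-match reads against tRNA space, report raw and RPM abundances,
    and flag sequences that also occur outside tRNA space (hypothetical helper)."""
    trna_index = substring_index(trna_space, min_len, max_len)
    other_index = substring_index(non_trna_space, min_len, max_len)
    # Deterministic: a read either is or is not an exact tRNA-space substring.
    counts = Counter(read for read in reads if read in trna_index)
    total = len(reads)  # assumption: normalize by all input reads
    return {
        seq: {
            "raw": raw,
            "rpm": raw * 1_000_000 / total,
            "candidate_false_positive": seq in other_index,
        }
        for seq, raw in counts.items()
    }


# Hypothetical toy inputs.
trna = ["ACGTACGTTTGGCCAAGGTTACGA"]
genome_elsewhere = ["CCCCTTGGCCAAGGTTACGACCCC"]
reads = ["ACGTACGTTTGGCCAA", "ACGTACGTTTGGCCAA", "TTGGCCAAGGTTACGA", "GGGGGGGGGGGGGGGG"]
print(profile_trfs(reads, trna, genome_elsewhere))
```

    Enumerating every substring is only practical for toy inputs; a genome-scale tool would need a far more compact lookup structure.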

    Machine Learning Approaches Identify Genes Containing Spatial Information From Single-Cell Transcriptomics Data.

    The development of single-cell sequencing technologies has allowed researchers to gain important new knowledge about the expression profiles of genes in thousands of individual cells of a model organism or tissue. A common disadvantage of this technology is the loss of the three-dimensional (3-D) structure of the cells. Consequently, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized the Single-Cell Transcriptomics Challenge, in which we participated, with the aim of addressing two problems: (a) to identify the top 60, 40, and 20 genes of the Drosophila melanogaster embryo that contain the most spatial information and (b) to reconstruct the 3-D arrangement of the embryo using information from those genes. We developed two independent techniques, based on the least absolute shrinkage and selection operator (Lasso) and on deep neural networks (NNs), which we applied to high-dimensional single-cell sequencing data to accurately identify genes that contain spatial information. Our first technique, Lasso.TopX, uses the Lasso and ranking statistics and lets the user specify the number of features of interest. The NN approach uses weak supervision for linear regression to accommodate uncertain or probabilistic training labels. We show, individually for both techniques, that we can identify a stable, user-defined number of important genes containing the most spatial information. Both techniques achieve high performance when reconstructing spatial information in D. melanogaster and also generalize to zebrafish (Danio rerio). Furthermore, we identified novel D. melanogaster genes that carry important positional information and were not previously suspected. We also show how the indirect use of information from the full datasets can lead to data leakage and an overestimate of the model's performance.
    Lastly, we discuss the applicability of our approaches to other feature selection problems outside the realm of single-cell sequencing and the importance of being able to handle probabilistic training labels. Our source code and detailed documentation are available at https://github.com/TJU-CMC-Org/SingleCell-DREAM/
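
    The abstract does not give the Lasso.TopX details, so the following Python sketch shows one generic way (an assumption, not the authors' procedure) to obtain a user-defined number X of features from a Lasso model: walk the regularization path from the largest penalty downward and keep the first X features whose coefficients become nonzero. It uses plain cyclic coordinate descent; the function names, grid size, and toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) used by Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, beta=None, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.
    Assumes the columns of X and y are already centered (no intercept)."""
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant column carries no information
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, alpha) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep residuals in sync with beta
                beta[j] = new
    return beta


def top_x_features(X, y, x, n_alphas=50, eps=1e-3):
    """Return up to x feature indices in the order they first enter the Lasso path."""
    n, p = len(X), len(X[0])
    # At alpha_max every coefficient is zero; walk the path downward from there.
    alpha_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
    order, beta = [], None
    for k in range(n_alphas):
        alpha = alpha_max * eps ** (k / (n_alphas - 1))
        beta = lasso_cd(X, y, alpha, beta)  # warm start from the previous alpha
        entered = [j for j in range(p) if beta[j] != 0.0 and j not in order]
        entered.sort(key=lambda j: -abs(beta[j]))  # break ties by coefficient size
        order.extend(entered)
        if len(order) >= x:
            break
    return order[:x]


X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]  # centered, orthogonal toy columns
y = [4, 2, -4, -2]  # y = 3 * column0 + 1 * column2
print(top_x_features(X, y, 2))
```

    In the toy data, y depends strongly on column 0, weakly on column 2, and not at all on column 1, so the path admits column 0 first and column 2 second, and column 1 never enters.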

    IsomiR Expression Profiles in Human Lymphoblastoid Cell Lines Exhibit Population and Gender Dependencies.

    For many years it was believed that each mature microRNA (miRNA) existed as a single entity with fixed endpoints and a 'static', unchangeable primary sequence. However, recent evidence suggests that mature miRNAs are more 'dynamic' and that each miRNA precursor arm gives rise to multiple isoforms, the isomiRs. Here we report the identification of numerous and abundant isomiRs in the lymphoblastoid cell lines (LCLs) of 452 men and women from five different population groups. Unexpectedly, we find that these isomiRs exhibit an expression profile that is population-dependent and gender-dependent. This is important because it indicates that the LCLs of each gender/population combination have their own unique collection of mature miRNA transcripts. Moreover, each identified isomiR has its own characteristic abundance that remains consistent across biological replicates, indicating that these isomiRs are not degradation products. The primary sequences of the identified isomiRs differ from the known miRBase miRNA at their 5' endpoint (which leads to a different 'seed' sequence and suggests a different targetome), at their 3' endpoint, or at both simultaneously. Our analysis of Argonaute PAR-CLIP data from LCLs supports the association of many of these newly identified isomiRs with the Argonaute silencing complex, and thus a functional role for them through participation in the RNA interference pathway.
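
    To make the endpoint distinction concrete, here is a small Python sketch that labels an isomiR as a 5' variant, a 3' variant, or both by comparing its endpoints on the precursor with those of the reference miRNA, and reports its seed. It assumes 1-based inclusive coordinates on a single precursor, uses the common convention that the seed is nucleotides 2 to 8, and considers only templated endpoint changes; the function names and toy sequence are hypothetical.

```python
def seed(precursor, start):
    """Nucleotides 2-8 of a molecule whose 5' end is at 1-based position `start`."""
    return precursor[start:start + 7]


def classify_isomir(precursor, ref_start, ref_end, iso_start, iso_end):
    """Compare an isomiR's endpoints with the reference miRNA on the same precursor.
    Coordinates are 1-based, inclusive, and read 5'->3' along the precursor."""
    five_prime_differs = iso_start != ref_start
    three_prime_differs = iso_end != ref_end
    if five_prime_differs and three_prime_differs:
        kind = "5' and 3'"
    elif five_prime_differs:
        kind = "5'"
    elif three_prime_differs:
        kind = "3'"
    else:
        kind = "reference"
    iso_seed = seed(precursor, iso_start)
    return {
        "sequence": precursor[iso_start - 1:iso_end],
        "type": kind,
        "seed": iso_seed,
        # A 5' shift moves the seed window; compare sequences to confirm it changed.
        "seed_changed": iso_seed != seed(precursor, ref_start),
    }


toy_precursor = "AAUGAGGUAGUAGGUUGUAUAGUUCC"  # hypothetical sequence
print(classify_isomir(toy_precursor, 3, 24, 4, 24))
```

    In this call the 5' end moves by one nucleotide, so the seed window shifts from GAGGUAG to AGGUAGU, whereas a 3'-only change leaves the seed untouched.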

    YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.

    Besides their role in translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and the molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has presented a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the inefficient adapter ligation that characterizes conventional RNA sequencing methods applied to mature tRNAs by using T4 RNA Ligase 2 to ligate a Y-shaped adapter efficiently and specifically to mature tRNAs. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity, detecting most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq can quantitatively estimate the expression levels of mature tRNAs, is highly reproducible, and is broadly applicable to various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulation in various transcriptomes; these profiles could play important regulatory roles in translation and other biological processes.

    The biogenesis pathway of tRNA-derived piRNAs in Bombyx germ cells.

    Transfer RNAs (tRNAs) function in the translational machinery and also serve as a source of short non-coding RNAs (ncRNAs). tRNA-derived ncRNAs show differential expression profiles and play roles in many biological processes beyond translation. The molecular mechanisms that shape and regulate their expression profiles are largely unknown. Here, we report the biogenesis mechanism of tRNA-derived Piwi-interacting RNAs (td-piRNAs) expressed in Bombyx BmN4 cells. In these cells, two cytoplasmic tRNA species, tRNAAspGUC and tRNAHisGUG, served as the major sources of td-piRNAs, which were derived from the 5'-part of the respective tRNAs. cP-RNA-seq identified the same two tRNAs as major substrates for 5'-tRNA halves as well, suggesting a previously uncharacterized link between 5'-tRNA halves and td-piRNAs. An increase in the levels of 5'-tRNA halves, induced by BmNSun2 knockdown, enhanced td-piRNA expression without a quantitative change in mature tRNAs, indicating that 5'-tRNA halves, not mature tRNAs, are the direct precursors of td-piRNAs. The generation of tRNAHisGUG-derived piRNAs required BmThg1l-mediated nucleotide addition at position -1 of tRNAHisGUG, revealing an important function of BmThg1l in piRNA biogenesis. Our study advances the understanding of the biogenesis mechanisms of tRNA-derived ncRNAs and of how their specific expression profiles arise.

    Knowledge about the presence or absence of miRNA isoforms (isomiRs) can successfully discriminate amongst 32 TCGA cancer types.

    Isoforms of human miRNAs (isomiRs) are constitutively expressed with tissue- and disease-subtype-dependencies. We studied 10 271 tumor datasets from The Cancer Genome Atlas (TCGA) to evaluate whether isomiRs can distinguish amongst 32 TCGA cancers. Unlike previous approaches, we built a classifier that relied solely on 'binarized' isomiR profiles: each isomiR is simply labeled as 'present' or 'absent'. The resulting classifier successfully labeled tumor datasets with an average sensitivity of 90% and a false discovery rate (FDR) of 3%, surpassing the performance of expression-based classification. The classifier maintained its power even after a 15× reduction in the number of isomiRs used for training. Notably, the classifier could correctly predict the cancer type in non-TCGA datasets from diverse platforms. Our analysis revealed that the most discriminatory isomiRs also happen to be differentially expressed between normal tissue and cancer. Even so, we find that these highly discriminating isomiRs have not attracted the most research attention in the literature. Given their ability to successfully classify datasets from 32 cancers, isomiRs and our resulting 'Pan-cancer Atlas' of isomiR expression could serve as a suitable framework to explore novel cancer biomarkers.
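
    The abstract does not specify the classifier itself, so the Python sketch below swaps in a simple stand-in (an assumption, not the published model): each profile is binarized into the set of isomiRs it contains, a query receives the label of the most similar training profile by Jaccard similarity, and per-class sensitivity and false discovery rate are computed as TP/(TP+FN) and FP/(TP+FP). The presence threshold, labels, and profiles are hypothetical.

```python
def binarize(profile, threshold=0.0):
    """Return the set of isomiR IDs whose abundance exceeds `threshold` (assumed cutoff)."""
    return {name for name, value in profile.items() if value > threshold}


def jaccard(a, b):
    """|A intersect B| / |A union B| for two presence sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict(training, query):
    """1-nearest neighbour over presence/absence sets, scored by Jaccard similarity.
    `training` is a list of (present_set, label) pairs."""
    best_label, best_sim = None, -1.0
    for present, label in training:
        sim = jaccard(present, query)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


def sensitivity_and_fdr(truth, predicted, label):
    """Per-class sensitivity TP/(TP+FN) and false discovery rate FP/(TP+FP)."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    fdr = fp / (tp + fp) if tp + fp else 0.0
    return sensitivity, fdr


# Hypothetical toy profiles (isomiR ID -> abundance).
training = [
    (binarize({"iso1": 5, "iso2": 3, "iso3": 0}), "BRCA"),
    (binarize({"iso3": 7, "iso4": 2}), "LUAD"),
]
queries = [{"iso1": 2, "iso2": 1, "iso4": 0}, {"iso1": 0.5, "iso3": 1, "iso4": 4}]
print([predict(training, binarize(q)) for q in queries])
```

    Because only set membership is compared, the abundance values matter solely through the presence threshold, which mirrors the "present"/"absent" labeling described in the abstract.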

    Nuclear and mitochondrial tRNA-lookalikes in the human genome.

    We are interested in identifying and characterizing loci of the human genome that harbor sequences resembling known mitochondrial and nuclear tRNAs. To this end, we used the known nuclear and mitochondrial tRNA genes (the tRNA-Reference set) to search for tRNA-lookalikes and found many such loci at different levels of sequence conservation. We find that the large majority of these tRNA-lookalikes resemble mitochondrial tRNAs and exhibit a skewed over-representation in favor of some mitochondrial anticodons. Our analysis shows that the tRNA-lookalikes have infiltrated specific chromosomes and are preferentially located in close proximity to known nuclear tRNAs (z-score ≤ -2.54, P-value ≤ 0.00394). Examination of the transcriptional potential of these tRNA-lookalike loci using public transcript annotations revealed that more than 20% of the lookalikes are transcribed as part of known protein-coding pre-mRNAs, known lncRNAs, or known non-protein-coding RNAs, and public RNA-seq data agreed perfectly with the endpoints of tRNA-lookalikes. Interestingly, we found that tRNA-lookalikes are significantly depleted in known genetic variations associated with human health and disease, whereas the known tRNAs are enriched in such variations. Lastly, a manual comparative analysis of the cloverleaf structure of several of the transcribed tRNA-lookalikes revealed no disruptive mutations, suggesting that these loci may give rise to functioning tRNA molecules.
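
    The abstract reports a z-score for the proximity of lookalikes to known nuclear tRNAs without describing the test. One generic way to obtain such a score, sketched below in Python as an assumption rather than the authors' method, is a permutation test: compare the observed mean distance from each lookalike to its nearest tRNA with the same statistic for randomly placed loci. The single-chromosome model, uniform random placement, and all names and positions are hypothetical simplifications.

```python
import random
import statistics
from bisect import bisect_left


def nearest_distance(pos, sorted_trnas):
    """Distance from `pos` to the closest position in the sorted list `sorted_trnas`."""
    i = bisect_left(sorted_trnas, pos)
    candidates = []
    if i < len(sorted_trnas):
        candidates.append(sorted_trnas[i] - pos)  # nearest tRNA at or to the right
    if i > 0:
        candidates.append(pos - sorted_trnas[i - 1])  # nearest tRNA to the left
    return min(candidates)


def mean_nearest_distance(positions, sorted_trnas):
    return statistics.mean(nearest_distance(p, sorted_trnas) for p in positions)


def proximity_z_score(lookalikes, trnas, chrom_length, n_perm=1000, seed=0):
    """z = (observed - null mean) / null sd; negative z means closer than random."""
    rng = random.Random(seed)
    trnas = sorted(trnas)
    observed = mean_nearest_distance(lookalikes, trnas)
    # Null distribution: the same number of loci placed uniformly at random.
    null = [
        mean_nearest_distance([rng.randrange(chrom_length) for _ in lookalikes], trnas)
        for _ in range(n_perm)
    ]
    return (observed - statistics.mean(null)) / statistics.stdev(null)


trna_positions = [1_000, 50_000, 90_000]
lookalike_positions = [1_005, 1_020, 995, 49_990, 50_010, 50_030, 89_980, 90_005, 90_040, 1_100]
print(proximity_z_score(lookalike_positions, trna_positions, chrom_length=100_000))
```

    Because every toy lookalike sits within about 100 bp of a tRNA, the score comes out strongly negative; a real analysis would also need to account for chromosome boundaries, locus lengths, and unplaceable regions.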

    Increasing cell density globally enhances the biogenesis of Piwi-interacting RNAs in Bombyx mori germ cells.

    Piwi proteins and their bound Piwi-interacting RNAs (piRNAs) are predominantly expressed in the germline and play crucial roles in germline development by silencing transposons and other targets. Bombyx mori BmN4 cells are culturable germ cells that possess the piRNA pathway. Because piRNA-expressing culturable cells are scarce, BmN4 cells are used for analyses of piRNA biogenesis. Here we report that piRNA biogenesis in BmN4 cells is regulated by cell density. As cell density increased, the abundance of Piwi proteins and piRNA biogenesis factors was commonly upregulated, resulting in an increased number of perinuclear nuage-like granules where Piwi proteins localize. Along with these changes, the abundance of mature piRNAs also increased globally, whereas levels of long piRNA precursors and transposons decreased, suggesting that increasing cell density promotes the piRNA biogenesis pathway and that the resulting accumulation of mature piRNAs is functionally significant for transposon silencing. Our study reveals a previously uncharacterized link between cell density and piRNA biogenesis, identifies cell density as a critical variable in piRNA studies using the BmN4 cell system, and suggests that altering cell density could be a useful tool to monitor piRNA biogenesis and function.

    MINTbase v2.0: a comprehensive database for tRNA-derived fragments that includes nuclear and mitochondrial fragments from all The Cancer Genome Atlas projects.

    MINTbase is a repository of nuclear and mitochondrial tRNA-derived fragments ('tRFs') found in multiple human tissues. The original version of MINTbase comprised tRFs obtained from 768 transcriptomic datasets. We used our deterministic and exhaustive tRF mining pipeline to process all of The Cancer Genome Atlas (TCGA) datasets and identified 23 413 tRFs with an abundance of ≥ 1.0 reads per million (RPM). To facilitate further study of tRFs by the community, we have released version 2.0 of MINTbase, which contains information about 26 531 distinct human tRFs from 11 719 human datasets as of October 2017. Key new elements include: the ability to filter tRFs on the fly by a minimum abundance threshold; the ability to filter tRFs by tissue keywords; easy access to information about a tRF's maximum abundance and the datasets that contain it; the ability to generate relative abundance plots for tRFs across cancer types and convert them into embeddable figures; and MODOMICS information about modifications of the parental tRNA. Version 2.0 of MINTbase contains 15x more datasets and nearly 4x more distinct tRFs than the original version, yet continues to offer fast, interactive access to its contents. Version 2.0 is freely available at http://cm.jefferson.edu/MINTbase/
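
    The abundance threshold and the per-tRF maximum described above reduce to simple bookkeeping. The Python sketch below computes reads per million as count * 1,000,000 / total reads, keeps entries at or above a 1.0 RPM threshold, and reports the dataset in which a tRF is most abundant. The data layout, function names, tRF labels, and values are hypothetical and do not reflect MINTbase's internal schema.

```python
def rpm(count, total_reads):
    """Reads per million: the count scaled to a library of one million reads."""
    return count * 1_000_000 / total_reads


def filter_by_rpm(datasets, min_rpm=1.0):
    """datasets maps dataset_id -> (total_reads, {trf_id: raw_count}).
    Returns trf_id -> {dataset_id: rpm}, keeping only entries with rpm >= min_rpm."""
    table = {}
    for dataset_id, (total_reads, counts) in datasets.items():
        for trf_id, count in counts.items():
            value = rpm(count, total_reads)
            if value >= min_rpm:
                table.setdefault(trf_id, {})[dataset_id] = value
    return table


def max_abundance(table, trf_id):
    """Return (dataset_id, rpm) for the dataset in which `trf_id` is most abundant."""
    per_dataset = table[trf_id]
    best = max(per_dataset, key=per_dataset.get)
    return best, per_dataset[best]


# Hypothetical toy datasets.
datasets = {
    "ds1": (2_000_000, {"tRF-A": 10, "tRF-B": 1}),
    "ds2": (1_000_000, {"tRF-A": 3, "tRF-C": 0}),
}
table = filter_by_rpm(datasets)
print(table, max_abundance(table, "tRF-A"))
```

    Here tRF-B (0.5 RPM) and tRF-C (0 RPM) fall below the threshold, so only tRF-A survives, with its maximum in ds1.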

    Beyond the one-locus-one-miRNA paradigm: microRNA isoforms enable deeper insights into breast cancer heterogeneity.

    Here we describe our study of miRNA isoforms (isomiRs) in breast cancer (BRCA) and normal breast data sets from The Cancer Genome Atlas (TCGA) repository. We report that full isomiR profiles, from both known and novel human-specific miRNA loci, are particularly rich in information and distinguish tumor from normal tissue much better than the archetype miRNAs do. IsomiR expression also depends on the patient's race, as exemplified by miR-183-5p, several isomiRs of which are upregulated in triple-negative BRCA in white but not black women. Additionally, we find that an isomiR's 5' endpoint and length, but not its genomic origin, are key determinants of the regulation of its expression. Overexpression of distinct miR-183-5p isomiRs in MDA-MB-231 cells followed by microarray analysis revealed that each isomiR has a distinct impact on the cellular transcriptome. Parallel integrative analysis of mRNA expression from BRCA data sets of the TCGA repository demonstrated that isomiRs can distinguish between the luminal A and luminal B subtypes and explain the molecular differences between them in more depth than the archetype molecules. In conclusion, our findings provide evidence that post-transcriptional studies of BRCA will benefit from transcending the one-locus-one-miRNA paradigm and taking into account all isoforms from each miRNA locus as well as the patient's race.